Unsupervised Domain Adaptation for Word Sense Disambiguation using Stacked Denoising Autoencoder
Authors
Abstract
In this paper, we propose an unsupervised domain adaptation method for Word Sense Disambiguation (WSD) using a Stacked Denoising Autoencoder (SdA). SdA is an unsupervised learning method that uses a neural network to obtain an abstract feature set from the input data. This abstract feature set absorbs the differences between domains, so SdA can address the domain adaptation problem. However, SdA does not help with every domain adaptation problem. In particular, the difficulty of domain adaptation for WSD depends on the combination of the source domain, the target domain, and the target word, so any domain adaptation method for WSD has an adverse effect on some part of the problem. We therefore define a similarity between two domains and use it to judge whether or not to apply SdA, which avoids the adverse effect of SdA. In our experiments, we used three domains from the Balanced Corpus of Contemporary Written Japanese and 16 target words. Compared with the baseline, our method achieved higher average accuracy for all combinations of two domains, and it also outperformed conventional domain adaptation methods.
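The paper itself includes no code, but as a rough illustration of the greedy layer-wise pre-training an SdA performs, here is a minimal numpy sketch. The layer sizes, corruption rate, learning rate, and number of epochs are illustrative assumptions, not values reported in the paper; the "abstract feature set" in the abstract corresponds to the top-layer encoding returned by abstract_features. The input X is assumed to be an (n_examples, n_features) matrix with values in [0, 1], e.g. bag-of-words indicators.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    class DenoisingAutoencoder:
        """One denoising autoencoder: corrupt the input, encode, decode, reconstruct."""

        def __init__(self, n_in, n_hidden, corruption=0.3, lr=0.1):
            self.W = rng.normal(0.0, 0.01, size=(n_in, n_hidden))
            self.b = np.zeros(n_hidden)   # hidden bias
            self.c = np.zeros(n_in)       # reconstruction bias
            self.corruption = corruption
            self.lr = lr

        def encode(self, x):
            return sigmoid(x @ self.W + self.b)

        def train_step(self, x):
            # Masking noise: randomly zero out a fraction of the input features.
            mask = rng.random(x.shape) > self.corruption
            x_tilde = x * mask
            h = self.encode(x_tilde)
            z = sigmoid(h @ self.W.T + self.c)      # decoder with tied weights
            # Gradients of the cross-entropy reconstruction loss w.r.t. W, b, c.
            dz = z - x
            dh = (dz @ self.W) * h * (1.0 - h)
            self.W -= self.lr * (x_tilde.T @ dh + dz.T @ h) / len(x)
            self.b -= self.lr * dh.mean(axis=0)
            self.c -= self.lr * dz.mean(axis=0)

    def train_sda(X, layer_sizes=(500, 100), epochs=50):
        """Greedy layer-wise pre-training: each layer denoises the previous layer's code."""
        layers, inp = [], X
        for n_hidden in layer_sizes:
            da = DenoisingAutoencoder(inp.shape[1], n_hidden)
            for _ in range(epochs):
                da.train_step(inp)
            inp = da.encode(inp)          # hidden code becomes the next layer's input
            layers.append(da)
        return layers

    def abstract_features(layers, X):
        """Map raw feature vectors to the top-layer abstract representation."""
        h = X
        for da in layers:
            h = da.encode(h)
        return h

In the setting the abstract describes, the layers would presumably be pre-trained on unlabeled feature vectors from both the source and target domains, and the WSD classifier would be trained on source-domain labels over abstract_features only when the domain similarity judges SdA to be helpful.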
Similar resources
Deep Nonlinear Feature Coding for Unsupervised Domain Adaptation
Deep feature learning has recently emerged with demonstrated effectiveness in domain adaptation. In this paper, we propose a Deep Nonlinear Feature Coding framework (DNFC) for unsupervised domain adaptation. DNFC builds on the marginalized stacked denoising autoencoder (mSDA) to extract rich deep features. We introduce two new elements to mSDA: domain divergence minimization by Maximum Mean Dis...
Use of Combined Topic Models in Unsupervised Domain Adaptation for Word Sense Disambiguation
Topic models can be used in unsupervised domain adaptation for Word Sense Disambiguation (WSD). In the domain adaptation task, three types of topic models are available: (1) a topic model constructed from the source domain corpus; (2) a topic model constructed from the target domain corpus; and (3) a topic model constructed from both domains. Basically, three topic features made from each to...
Marginalized Stacked Denoising Autoencoders
Stacked Denoising Autoencoders (SDAs) [4] have been used successfully in many learning scenarios and application domains. In short, denoising autoencoders (DAs) train one-layer neural networks to reconstruct input data from partial random corruption. The denoisers are then stacked into deep learning architectures where the weights are fine-tuned with back-propagation. Alternatively, the outputs...
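The snippet above stops mid-sentence, but the key point of mSDA is that the denoiser has a closed-form solution, so no back-propagation or sampling of corrupted copies is needed. A minimal numpy sketch of that closed form, based on the standard mSDA formulation (the corruption probability, number of layers, and the small ridge term are illustrative choices, not taken from this page), could look like this:

    import numpy as np

    def mda_layer(X, p=0.5):
        """One marginalized denoising autoencoder layer, computed in closed form.

        X: (d, n) matrix of n examples with d features (columns are examples).
        p: probability that each input feature is dropped (corrupted) to zero.
        Returns W of shape (d, d + 1) mapping a corrupted input (plus bias)
        back to its expected reconstruction.
        """
        d, n = X.shape
        Xb = np.vstack([X, np.ones((1, n))])      # append a constant bias feature
        q = np.full(d + 1, 1.0 - p)
        q[-1] = 1.0                               # the bias feature is never corrupted
        S = Xb @ Xb.T                             # scatter matrix
        Q = S * np.outer(q, q)                    # E[x_tilde x_tilde^T], off-diagonal terms
        np.fill_diagonal(Q, q * np.diag(S))       # diagonal terms scale by q_i only once
        P = S[:d, :] * q                          # E[x x_tilde^T], bias row dropped
        # W = P Q^{-1}; a small ridge term keeps Q well conditioned.
        W = np.linalg.solve(Q + 1e-5 * np.eye(d + 1), P.T).T
        return W

    def msda_features(X, n_layers=3, p=0.5):
        """Stack layers and concatenate the raw input with every layer's tanh output."""
        reps, h = [X], X
        for _ in range(n_layers):
            W = mda_layer(h, p)
            hb = np.vstack([h, np.ones((1, h.shape[1]))])
            h = np.tanh(W @ hb)
            reps.append(h)
        return np.vstack(reps)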
Learning Entity Representation for Entity Disambiguation
We propose a novel entity disambiguation model, based on Deep Neural Network (DNN). Instead of utilizing simple similarity measures and their disjoint combinations, our method directly optimizes document and entity representations for a given similarity measure. Stacked Denoising Auto-encoders are first employed to learn an initial document representation in an unsupervised pre-training stage. ...
HIT-CIR: An Unsupervised WSD System Based on Domain Most Frequent Sense Estimation
This paper presents an unsupervised system for all-word domain specific word sense disambiguation task. This system tags target word with the most frequent sense which is estimated using a thesaurus and the word distribution information in the domain. The thesaurus is automatically constructed from bilingual parallel corpus using paraphrase technique. The recall of this system is 43.5% on SemEv...
Journal:
Volume / Issue:
Pages: -
Publication date: 2015